545 research outputs found

    A novel symbolization technique for time-series outlier detection

    The detection of outliers in time-series data is a core component of many data-mining applications and is broadly applied in industry. In large data sets, algorithms that are efficient in both time and space are required. One area where speed and storage costs can be reduced is symbolization as a pre-processing step, which additionally opens up the use of an array of discrete algorithms. With this common pre-processing step in mind, this work highlights that (1) existing symbolization approaches are designed to address problems other than outlier detection and are hence sub-optimal, and (2) use of off-the-shelf symbolization techniques can therefore lead to significant unnecessary data corruption and potential performance loss when outlier detection is a key aspect of the data-mining task at hand. Addressing this, a novel symbolization method is motivated that specifically targets the end-use application of outlier detection. The method is empirically shown to outperform existing approaches.
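    To make the pre-processing step concrete, the following is a minimal sketch of a standard SAX-style symbolizer, i.e. one of the off-the-shelf techniques the abstract critiques, not the novel method proposed. The function name and the hard-coded breakpoint table are illustrative assumptions.

```python
import math

# Breakpoints dividing N(0,1) into equiprobable regions (standard SAX tables).
_BREAKPOINTS = {3: [-0.43, 0.43], 4: [-0.67, 0.0, 0.67], 5: [-0.84, -0.25, 0.25, 0.84]}

def sax_symbolize(series, word_length=8, alphabet_size=4):
    """Symbolize a time series: z-normalise, reduce by Piecewise Aggregate
    Approximation (PAA), then map each segment mean to a letter."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    z = [(v - mean) / std for v in series]          # z-normalise
    paa = []
    for i in range(word_length):                    # mean of each segment
        seg = z[i * n // word_length:(i + 1) * n // word_length]
        paa.append(sum(seg) / len(seg))
    cuts = _BREAKPOINTS[alphabet_size]
    # Count how many breakpoints each segment mean exceeds -> letter index.
    return "".join(chr(ord("a") + sum(v > c for c in cuts)) for v in paa)
```

A long numeric series is thus reduced to a short discrete word, over which discrete algorithms can operate; note how a single spike is confined to one symbol of the output.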

    Housing stakeholders’ perspectives on offsite manufacturing in Nigeria

    Despite several mitigation attempts, Nigeria still faces a deficit of 17 million houses. Seminal literature argues that this problem is predominantly due to a myriad of issues, including high construction costs, skills shortages, the slow pace of construction, lack of infrastructure and logistics, and the poor quality of available housing stock. Offsite manufacturing (OSM) has been proffered as an innovative method for addressing such issues. This paper reports on the findings of a feasibility study, which investigated Nigerian stakeholders’ perceptions of the needs, promises and barriers of adopting offsite manufacturing in Nigeria. To achieve this, in-depth interviews were conducted with experts directly involved in housing delivery. Data gathered from the experts were analysed using exploratory thematic analysis, with NVivo software used to transcribe and analyse the research data. Findings from the in-depth interviews showed that the housing deficit in Nigeria is increasing and that nothing significant is currently being done about it. Stakeholders also posited that although OSM could improve housing delivery efforts in Nigeria, its adoption is still considerably low as a result of a myriad of issues, such as negative local perceptions of OSM, client resistance, lack of infrastructure and skills shortages. This study concludes that for OSM to be adopted in Nigeria, there is a need for proper sensitisation, collaboration and encouragement from government. This study presents additional understanding of OSM in Nigeria based on expert opinion, the results of which will become a stepping-stone for the development of a roadmap for the adoption of OSM in Nigeria. It is proffered that adopting OSM can help support housing delivery efforts in Nigeria, and may also deliver wider benefits to the construction industry and its associated supply chain.

    Towards optimal symbolization for time series comparisons

    The abundance and value of mining large time-series data sets has long been acknowledged. Ubiquitous in fields ranging from astronomy and biology to web science, the size and number of these datasets continue to increase, a situation exacerbated by the exponential growth of our digital footprints. The prevalence and potential utility of this data has led to a vast number of time-series data-mining techniques, many of which require symbolization of the raw time series as a pre-processing step, for which a number of well-used, pre-existing approaches from the literature are typically employed. In this work we note that these standard approaches are sub-optimal in (at least) the broad application area of time-series comparison, leading to unnecessary data corruption and potential performance loss before any real data mining takes place. Addressing this, we present a novel quantizer based upon optimization of comparison fidelity and a computationally tractable algorithm for its implementation on big datasets. We demonstrate empirically that our new approach provides a statistically significant reduction in the amount of error introduced by the symbolization process compared to the current state of the art. The approach therefore provides a more accurate input for the vast number of data-mining techniques in the literature, offering the potential for increased real-world performance across a wide range of existing data-mining algorithms and applications.
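    For contrast with the comparison-fidelity objective proposed here, the sketch below shows the conventional baseline a quantizer is usually fitted against: Lloyd’s algorithm, which alternates nearest-level assignment and centroid update to minimise mean squared symbolization error. This is an assumed generic baseline, not the paper’s method; the function name and initialisation are illustrative.

```python
def lloyd_max(data, n_levels=4, n_iter=50):
    """Fit quantizer representation levels by Lloyd's algorithm,
    minimising the mean squared error of the symbolization."""
    xs = sorted(float(v) for v in data)
    # Initialise levels at evenly spaced sample quantiles.
    levels = [xs[(2 * k + 1) * len(xs) // (2 * n_levels)] for k in range(n_levels)]
    for _ in range(n_iter):
        # Decision boundaries sit midway between adjacent levels.
        bounds = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
        cells = [[] for _ in range(n_levels)]
        for x in xs:
            cells[sum(x > b for b in bounds)].append(x)   # nearest level
        # Each level moves to the centroid of the points assigned to it.
        levels = [sum(c) / len(c) if c else levels[k]
                  for k, c in enumerate(cells)]
    return levels
```

Each input value is then replaced by the index of its nearest level; the paper’s argument is that minimising this reconstruction error is not the same as preserving fidelity of pairwise comparisons.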

    Cross-system recommendation: user-modelling via social media versus self-declared preferences

    It is increasingly rare to encounter a Web service that doesn’t engage in some form of automated recommendation, with Collaborative Filtering (CF) techniques being virtually ubiquitous as the means of delivering relevant content. Yet several key issues remain unresolved, including optimal handling of cold starts and how best to maintain user privacy in that context. Recent work has demonstrated a potentially fruitful line of attack in the form of cross-system user modelling, which uses features generated in one domain to bootstrap recommendations in another. In this paper we evidence the effectiveness of this approach through direct real-world user feedback, deconstructing a cross-system news recommendation service where user models are generated from social media data. It is shown that even when a relatively naive vector-space approach is used, it is possible to automatically generate user models that provide statistically superior performance compared to explicitly filtering items based on a user’s self-declared preferences. Detailed qualitative analysis of why such effects occur indicates that different models capture widely different areas of a user’s preference space, and that hybrid models represent fertile ground for future research.
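    A “relatively naive vector-space approach” of the kind described can be illustrated as follows: build a term-frequency profile from a user’s social-media text and rank candidate news items by cosine similarity. This is a hypothetical minimal sketch under that assumption, not the deployed service’s code.

```python
import math
from collections import Counter

def tf_vector(terms):
    """Bag-of-words term-frequency vector as a Counter."""
    return Counter(terms)

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in set(u) & set(v))
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_items(user_terms, items):
    """Rank candidate items (dicts with a 'terms' list) against a user
    model built from cross-system (e.g. social media) terms."""
    profile = tf_vector(user_terms)
    return sorted(items,
                  key=lambda it: cosine_similarity(profile, tf_vector(it["terms"])),
                  reverse=True)
```

The point of the cross-system setting is that `user_terms` comes from a different platform than the items being ranked, bootstrapping recommendations with no in-domain interaction history.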

    Event series prediction via non-homogeneous Poisson process modelling

    Data streams whose events occur at random arrival times rather than at the regular, tick-tock intervals of traditional time series are increasingly prevalent. Event series are continuous, irregular and often highly sparse, differing greatly in nature from the regularly sampled time series that have traditionally been the concern of the hard sciences. As mass sets of such data have become more common, so interest in predicting future events in them has grown. Yet repurposing of traditional forecasting approaches has proven ineffective, partly due to issues such as sparsity, but often due to inapplicable underpinning assumptions such as stationarity and ergodicity. In this paper we derive a principled new approach to forecasting event series that avoids such assumptions, based upon: (1) the processing of event-series datasets to produce a parameterized mixture model of non-homogeneous Poisson processes; and (2) application of a technique called parallel forecasting that uses these processes’ rate functions to directly generate accurate temporal predictions for new query realizations. This approach uses forerunners of a stochastic process to shed light on the distribution of future events, not for themselves, but for realizations that subsequently follow in their footsteps.
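    A minimal sketch of the non-homogeneous Poisson machinery involved: the standard Lewis–Shedler thinning sampler below draws event times from a given rate function λ(t). The function names and bounding-rate argument are illustrative assumptions; the paper’s mixture-model fitting and parallel forecasting are not reproduced here.

```python
import random

def simulate_nhpp(rate_fn, rate_max, t_end, seed=0):
    """Sample event times on [0, t_end] from a non-homogeneous Poisson
    process with rate function rate_fn, via Lewis-Shedler thinning.
    rate_max must upper-bound rate_fn on the interval."""
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        # Candidate arrivals from a homogeneous process at the bounding rate.
        t += rng.expovariate(rate_max)
        if t > t_end:
            return events
        # Keep each candidate with probability rate(t) / rate_max.
        if rng.random() < rate_fn(t) / rate_max:
            events.append(t)
```

With a constant rate the sampler reduces to an ordinary Poisson process; a time-varying `rate_fn` (e.g. a fitted mixture of rate functions) concentrates predicted events where the intensity is high.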

    Post-Harvest to Consumer Driver Review of the Aquatic Supply Chain

    This paper provides an overview of the key current features of the international markets for aquatic food and appraises how the future drivers of the post-harvest/consumption aspects of the value chain will interact. This encompasses product from both wild-capture fisheries and aquaculture. Here, ‘post-harvest’ covers all those activities involved in delivering aquatic products from the water to the plate, in particular those concerned with their processing and trading. The system is highly diverse: a wide range of aquatic species and products, and changing patterns of demand and supply, across a spectrum of cultural, economic and political contexts, give rise to a variety of post-harvest configurations and future directions. The only constant across the sector is product perishability, at higher rates than for most terrestrial foods, and a critical element to be managed if value is to be delivered without loss and if the potential for adding value is to be realised.

    A refined limit on the predictability of human mobility

    It has recently been claimed that human movement is highly predictable. While an upper bound of 93% predictability was shown, this was based upon human movement trajectories of very high spatiotemporal granularity. Recent studies reduced this spatiotemporal granularity down to the level of GPS data, and under a similar methodology results once again suggested a high predictability upper bound (i.e. 90% when movement was quantized to a spatial resolution approximately the size of a large building). In this work we reconsider the derivation of the upper bound on movement predictability. By considering real-world topological constraints we are able to achieve a tighter upper bound, representing a more refined limit to the predictability of human movement. Our results show that this upper bound is between 11% and 24% lower than previously claimed at a spatial resolution of approximately 100 m × 100 m, with a greater improvement for finer spatial resolutions. This indicates that human mobility is potentially less predictable than previously thought. We provide an in-depth examination of how varying the spatial and temporal quantization affects predictability, and consider the impact of the corresponding limits using a large set of real-world GPS traces. Particularly at fine-grained spatial quantizations, where a significant number of practical applications lie, these new (lower) upper limits raise serious questions about the use of location information alone for prediction, contributing further evidence that such prediction must integrate external variables.
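    The predictability upper bound referred to here is conventionally obtained by solving Fano’s inequality, S = H(Π) + (1 − Π) log₂(N − 1), for the maximum predictability Π given an entropy estimate S over N locations. The bisection sketch below is a generic implementation of that standard bound, not the paper’s refined, topology-constrained one.

```python
import math

def max_predictability(entropy, n_locations, tol=1e-9):
    """Solve Fano's inequality  S = H(P) + (1-P)*log2(N-1)  for the
    maximum predictability P by bisection; fano(P) is decreasing in P
    on [1/N, 1], so the root is unique."""
    def fano(p):
        h = 0.0
        for q in (p, 1.0 - p):          # binary entropy H(p)
            if 0.0 < q < 1.0:
                h -= q * math.log2(q)
        return h + (1.0 - p) * math.log2(n_locations - 1)

    lo, hi = 1.0 / n_locations, 1.0 - tol
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > entropy:         # bound still above S: P can be larger
            lo = mid
        else:
            hi = mid
    return lo
```

Lower entropy or fewer reachable locations pushes the bound toward 1; the paper’s contribution is effectively that realistic constraints change the inputs to this calculation, tightening the resulting limit.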

    Mobile capture of remote points of interest using line of sight modelling

    Recording points of interest using GPS whilst working in the field is an established technique in geographical fieldwork, where the user’s current position is used as the spatial reference to be captured; this is known as geo-tagging. We outline the development and evaluation of a smartphone application called Zapp that enables geo-tagging of any distant point on the visible landscape. The ability of users to log or retrieve information relating to what they can see, rather than where they are standing, allows them to record observations of points in the broader landscape scene, or to access descriptions of landscape features from any viewpoint. The application uses the compass orientation and tilt of the phone to provide data for a line-of-sight algorithm that intersects with a Digital Surface Model stored on the mobile device. We describe the development process and design decisions for Zapp, present the results of a controlled study of the accuracy of the application, and report on the use of Zapp for a student field exercise. The studies indicate the feasibility of the approach, but also how the appropriate use of such techniques will be constrained by current levels of precision in mobile sensor technology. The broader implications for interactive query of the distant landscape and for remote data logging are discussed.
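    A line-of-sight intersection of the kind described can be sketched as a simple ray march over an elevation grid: step outward from the observer along the compass bearing, climbing at the tilt angle, until the ray drops to the surface. Grid orientation, step size, and all parameter names below are illustrative assumptions rather than Zapp’s actual implementation.

```python
import math

def line_of_sight_target(dsm, cell_size, observer, bearing_deg, tilt_deg,
                         observer_height=1.6, step=1.0, max_range=5000.0):
    """March a ray from the observer cell along a compass bearing and tilt
    over a Digital Surface Model (2-D list of elevations, metres); return
    the (row, col) of the first cell the ray intersects, or None."""
    row0, col0 = observer
    eye = dsm[row0][col0] + observer_height        # eye height above datum
    dz = math.tan(math.radians(tilt_deg))          # climb per metre travelled
    dist = step
    while dist <= max_range:
        # Assumed grid layout: bearing 0 deg = north = decreasing row index.
        r = round(row0 - dist * math.cos(math.radians(bearing_deg)) / cell_size)
        c = round(col0 + dist * math.sin(math.radians(bearing_deg)) / cell_size)
        if not (0 <= r < len(dsm) and 0 <= c < len(dsm[0])):
            return None                            # ray left the model
        if eye + dist * dz <= dsm[r][c]:
            return (r, c)                          # ray hit the surface
        dist += step
    return None
```

The achievable accuracy of such a scheme is bounded by the DSM cell size and, as the abstract notes, by the precision of the phone’s compass and tilt sensors, since a small bearing error grows linearly with distance.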